1 Preface

1.1 Background

In this part, a deep learning approach is used to classify whether a presented image shows a beach, a forest, or a mountain. Deep learning approaches are commonly applied to image classification, object detection, facial recognition, and so on.

1.2 Data Source


Image Classification

1.3 Packages List

# Data wrangling
library(tidyverse)

# Image manipulation
library(imager)

# Deep learning
# devtools::install_github("rstudio/keras")
library(keras)
# install_keras(method = c("conda"))
library(tensorflow)

# Model Evaluation
library(caret)

options(scipen = 999)

1.4 Setup

You need to install the pillow package in your conda environment to manipulate image data. Here are short instructions on how to create a new conda environment with tensorflow and pillow inside it.

1. Open the terminal, either in the Anaconda command prompt or directly in RStudio.

2. Create a new conda environment by running the following command: conda create -n r-tensorflow python=3.9

3. Activate the conda environment by running the following command: conda activate r-tensorflow

4. Install the tensorflow package into the environment: conda install -c conda-forge tensorflow=2.12.0 or conda install tensorflow=2.12.0

5. Install the pillow package: conda install pillow
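The steps above can be collected into one short shell session (a sketch, assuming conda is available on your PATH):

```shell
# Create a dedicated conda environment for keras/tensorflow in R
conda create -n r-tensorflow python=3.9

# Activate it
conda activate r-tensorflow

# Install tensorflow and pillow into the environment
conda install -c conda-forge tensorflow=2.12.0
conda install pillow
```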


2 EDA

2.1 Separate Folder

In image classification problems, it is common practice to put the images in separate folders based on the target class/label. For example, inside the train folder in our data, you can see that we have 3 different folders, one each for beach, forest, and mountain.

Let’s try to get the file name of each image. First, find the folder of each target class: the code below gets the folder names inside the train folder.

folder_list <- list.files("data/train/")

folder_list
#> [1] "beach"    "forest"   "mountain"

Combine the folder names with the path of the train folder in order to access the content inside each folder.

folder_path <- paste0("data/train/", folder_list, "/")

folder_path
#> [1] "data/train/beach/"    "data/train/forest/"   "data/train/mountain/"

Use the map() function to iterate over the folders and collect the file names for each class (beach, forest, mountain). map() returns a list, so use the unlist() function to combine the file names from the 3 folders into a single vector.

# Get file name
file_name <- map(folder_path, 
                 function(x) paste0(x, list.files(x))
                 ) %>% 
  unlist()

# first 6 file name
head(file_name)
#> [1] "data/train/beach/beach_100.jpeg" "data/train/beach/beach_101.jpeg"
#> [3] "data/train/beach/beach_102.jpeg" "data/train/beach/beach_103.jpeg"
#> [5] "data/train/beach/beach_104.jpeg" "data/train/beach/beach_105.jpeg"

Check the last 6 file names using the tail() function.

# last 6 file name
tail(file_name)
#> [1] "data/train/mountain/mountain_a_45.jpeg"
#> [2] "data/train/mountain/mountian_115.jpeg" 
#> [3] "data/train/mountain/mountian_45.jpeg"  
#> [4] "data/train/mountain/mountian_72.jpeg"  
#> [5] "data/train/mountain/mountian_74.jpeg"  
#> [6] "data/train/mountain/mountian_95.jpeg"

Let’s check how many images are listed in file_name.

length(file_name)
#> [1] 1328

To check the content of a file, we can use the load.image() function from the imager package. Let’s randomly visualize 6 images from the data.

# Randomly select image
set.seed(91)
sample_image <- sample(file_name, 6)

# Load image into R
img <- map(sample_image, load.image)

# Plot image
par(mfrow = c(2, 3)) # Create 2 x 3 image grid
invisible(map(img, plot))


2.2 Image Dimension

An important part of image classification is understanding the distribution of the image dimensions, in order to choose a proper input dimension for building the deep learning model.

# Full image description

image <- load.image(file_name[1])
image
#> Image. Width: 279 pix Height: 181 pix Depth: 1 Colour channels: 3

The result above shows the dimensions of the image. Height and width are measured in pixels. The number of color channels indicates whether the image is in grayscale format (color channels = 1) or in RGB format (color channels = 3). Use the dim() function to get the value of each dimension.

dim(image)
#> [1] 279 181   1   3

The result above shows that the width = 279, height = 181, depth = 1, and channels = 3 (RGB); note that dim() returns the width first, matching the printout above.

Create a function that gets the height and width of an image and converts them into a data.frame.

# Function for acquiring width and height of an image
get_dim <- function(x){
  img <- load.image(x) 
  
  df_img <- data.frame(height = height(img),
                       width = width(img),
                       filename = x
                       )
  
  return(df_img)
}

# Implementation

file_dim <- map_df(file_name, get_dim)
file_dim

The result above collects each image’s height and width into a data frame; the images come in many different sizes.

summary(file_dim)
#>      height          width         filename        
#>  Min.   : 94.0   Min.   :100.0   Length:1328       
#>  1st Qu.:168.0   1st Qu.:268.0   Class :character  
#>  Median :183.0   Median :275.0   Mode  :character  
#>  Mean   :178.2   Mean   :282.9                     
#>  3rd Qu.:184.0   3rd Qu.:300.0                     
#>  Max.   :314.0   Max.   :534.0

From the summary, it can be concluded that the image quality is medium to low: the middle 50% of images fall between 168 and 184 px in height and between 268 and 300 px in width. Image height ranges from a minimum of 94 px to a maximum of 314 px, and image width ranges from a minimum of 100 px to a maximum of 534 px.


3 Data Preprocessing

3.1 Image Augmentation

Determine the input image dimensions for the deep learning model based on summary(file_dim). All input images must share the same dimensions, so every image is transformed to 128 x 128 px. The higher the image dimensions, the longer it takes to train the model; but if the images are too small, a lot of information is lost. Furthermore, set the batch size to 15, so the model weights are updated after each batch is processed.

# Desired height and width of images
target_size <- c(128,128)

# Batch size for training the model
batch_size <- 15 

Because the training set is small, use the image_data_generator() function from the keras package for image augmentation, which enlarges the effective training set without acquiring new images. In data augmentation, each image is modified (e.g. flipped, rotated, zoomed, cropped, de-texturized, de-colorized, edge-enhanced, salient edge map, etc.) so that the model can learn more robustly.

Create the image generator for keras with the following properties:

  • Scale the pixel values by dividing them by 255
  • Flip images horizontally at random
  • Set width_shift_range to 0.2
  • Set height_shift_range to 0.2
  • Zoom in or zoom out by up to 20%
  • Set brightness_range to 1-2
  • Set fill_mode to “nearest”
  • Set validation_split to 0.2

You can explore more features about the image generator on this link.

# Image Generator
train_data_gen <-
  image_data_generator(rescale = 1/255,
                       horizontal_flip = T,
                       width_shift_range = 0.2,
                       height_shift_range = 0.2,
                       zoom_range = 0.2,
                       brightness_range = c(1,2),
                       fill_mode = "nearest",
                       validation_split = 0.2)

The generator is applied using flow_images_from_directory(). The data is located inside the data folder, in the train subfolder, so the directory is data/train. This process yields augmented images for both the training data and the validation data, splitting 80% for training and 20% for validation. The training set is used to train the model, while the validation set is used for evaluation.

# Training Dataset
train_image_array_gen <-
  flow_images_from_directory(
    directory = "data/train/",
    # Folder of the data
    target_size = target_size,
    # target of the image dimension (128 x 128)
    color_mode = "rgb",
    # use RGB color
    batch_size = batch_size ,
    seed = 123,
    # set random seed
    subset = "training",
    # declare that this is for training data
    generator = train_data_gen
  )

# Validation Dataset
val_image_array_gen <-
  flow_images_from_directory(
    directory = "data/train/",
    target_size = target_size,
    color_mode = "rgb",
    batch_size = batch_size ,
    seed = 123,
    subset = "validation",
    # declare that this is the validation data
    generator = train_data_gen
  )

Check the class proportions of the training data set. The indices correspond to the labels of the target variable, ordered alphabetically (beach, forest, mountain).

# Number of training samples

train_sample <- train_image_array_gen$n

# Number of validation samples

valid_sample <- val_image_array_gen$n

# Number of target classes/categories
output_n <- n_distinct(train_image_array_gen$classes)

# Get the class proportion
# train
table("\nFrequency" = factor(train_image_array_gen$classes)
      ) %>% 
  prop.table()
#> 
#> Frequency
#>         0         1         2 
#> 0.3214286 0.3045113 0.3740602
# test
table("\nFrequency" = factor(val_image_array_gen$classes)
      ) %>% 
  prop.table()
#> 
#> Frequency
#>         0         1         2 
#> 0.3219697 0.3030303 0.3750000

From the output, it can be concluded that the class proportions are balanced in both the training and validation sets.


4 Convolutional Neural Network

The Convolutional Neural Network (CNN), built around convolutional layers, is a popular architecture for image classification. An image is a 2-dimensional array, with height and width, plus 3 channels of RGB (Red, Green, Blue) information. A computer can process an image once it is converted to pixel values, and a CNN changes the architecture of a neural network to take advantage of this spatial structure.


5 Model Architecture

CNNs transform the input data from the input layer through all connected layers into a set of class scores given by the output layer. The feature-extraction layers have a general repeating pattern of the sequence:

1. Convolution layer

The Rectified Linear Unit (ReLU) activation function is expressed as a layer here to match up with other literature.

2. Pooling layer

These layers find a number of features in the images and progressively construct higher-order features. This corresponds directly to the ongoing theme in deep learning by which features are automatically learned as opposed to traditionally hand engineered.

Finally, we have the classification layers, in which one or more fully connected layers take the higher-order features and produce class probabilities or scores. These layers are fully connected to all of the neurons in the previous layer, as their name implies. The output of these layers is typically a two-dimensional array of dimensions [b × N], where b is the number of examples in the mini-batch and N is the number of classes we’re interested in scoring.
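As a small, hand-rolled illustration (not the keras implementation), the softmax activation used later in the output layer turns raw class scores into probabilities that sum to 1:

```r
# Minimal softmax sketch: turns a vector of raw class scores into probabilities
softmax <- function(z) {
  exp_z <- exp(z - max(z)) # subtract max for numerical stability
  exp_z / sum(exp_z)
}

scores <- c(beach = 2.0, forest = 0.5, mountain = 1.0) # hypothetical raw scores
probs  <- softmax(scores)

round(probs, 3)
sum(probs) # probabilities always sum to 1
```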

Let’s build a simple model first with the following layers:

  • Convolutional layer to extract features from 2D image with relu activation function.
  • Max Pooling layer to downsample the image features.
  • Flattening layer to flatten data from 2D array to 1D array.
  • Dense layer to capture more information.
  • Dense layer for output with softmax activation function.

Don’t forget to set the input size in the first layer. If the input image is in RGB, set the final number to 3, which is the number of color channels. If the input image is in grayscale, set the final number to 1.

# input shape of the image
c(target_size, 3) 
#> [1] 128 128   3
# Set Initial Random Weight
# tensorflow::tf$random$set_seed(121)
tensorflow::set_random_seed(105)
model <- keras_model_sequential(name = "simple_model") %>% 
  
  # Convolution Layer
  layer_conv_2d(filters = 32,
                kernel_size = c(3,3),
                padding = "same",
                activation = "relu",
                input_shape = c(target_size, 3) 
                ) %>% 

  # Max Pooling Layer
  layer_max_pooling_2d(pool_size = c(2,2)) %>% 
  
  # Flattening Layer
  layer_flatten() %>% 
  
  # Dense Layer
  layer_dense(units = 1000,
              activation = "relu") %>% 
  
  # Output Layer
  layer_dense(units = output_n,
              activation = "softmax",
              name = "Output")
  
model
#> Model: "simple_model"
#> ________________________________________________________________________________
#>  Layer (type)                       Output Shape                    Param #     
#> ================================================================================
#>  conv2d (Conv2D)                    (None, 128, 128, 32)            896         
#>  max_pooling2d (MaxPooling2D)       (None, 64, 64, 32)              0           
#>  flatten (Flatten)                  (None, 131072)                  0           
#>  dense (Dense)                      (None, 1000)                    131073000   
#>  Output (Dense)                     (None, 3)                       3003        
#> ================================================================================
#> Total params: 131,076,899
#> Trainable params: 131,076,899
#> Non-trainable params: 0
#> ________________________________________________________________________________
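As a sanity check on the summary above, the parameter counts can be reproduced by hand: a conv layer has (kernel height x kernel width x input channels + 1 bias) x filters parameters, and a dense layer has (inputs + 1) x units:

```r
# Reproduce the parameter counts from the model summary by hand
conv_params   <- (3 * 3 * 3 + 1) * 32       # 3x3 kernel, 3 input channels, 32 filters
dense_params  <- (64 * 64 * 32 + 1) * 1000  # flattened 64 x 64 x 32 maps into 1000 units
output_params <- (1000 + 1) * 3             # 1000 inputs into 3 classes

conv_params   # 896
dense_params  # 131073000
output_params # 3003
conv_params + dense_params + output_params # 131076899
```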

6 Model Fitting

Compile the model by specifying the loss function and the optimizer, then fit it on the training data while evaluating on the validation data. In this step, use 40 epochs, a 0.001 learning rate, categorical cross-entropy as the loss function, and the SGD optimizer.

model %>% 
  compile(
    loss = "categorical_crossentropy",
    optimizer = optimizer_sgd(learning_rate = 0.001),
    metrics = "accuracy"
  )
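For intuition, here is a hand-rolled sketch of categorical cross-entropy for a single observation (keras computes this internally; this is only an illustration): the loss is -sum(y_true * log(y_pred)), where y_true is the one-hot encoded label.

```r
# Hand-rolled categorical cross-entropy for a single observation
categorical_crossentropy <- function(y_true, y_pred) {
  -sum(y_true * log(y_pred))
}

y_true <- c(0, 1, 0)          # one-hot label: the image is a forest
good   <- c(0.05, 0.90, 0.05) # confident, correct prediction
bad    <- c(0.70, 0.20, 0.10) # confident, wrong prediction

categorical_crossentropy(y_true, good) # small loss
categorical_crossentropy(y_true, bad)  # larger loss
```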

# Fit data into model
history <- model %>% 
  fit_generator(
  # training data
  train_image_array_gen, 

  # training epochs
  steps_per_epoch = as.integer(train_sample / batch_size), 
  epochs = 40, 
  
  # validation data
  validation_data = val_image_array_gen, 
  validation_steps = as.integer(valid_sample / batch_size))
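steps_per_epoch controls how many batches make up one epoch. With validation_split = 0.2 on the 1328 images, 264 images go to validation (per-class splits round down), leaving 1064 for training; at a batch size of 15 that gives about 70 steps per epoch:

```r
# Rough steps-per-epoch arithmetic (counts taken from the outputs in this document)
n_total    <- 1328       # total images in data/train
n_valid    <- 264        # images assigned to the 20% validation split
n_train    <- n_total - n_valid
batch_size <- 15         # mirrors the batch size set earlier

steps <- as.integer(n_train / batch_size)
steps # 70
```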

plot(history)


7 Untuned Model Evaluation

Before evaluating the model with a confusion matrix, we need the file names of the images used as validation data, extracting the categorical label from each file name as the actual value of the target variable.

val_data_untuned <- data.frame(file_name = paste0("data/train/", val_image_array_gen$filenames)) %>% 
  mutate(class = str_extract(file_name, "beach|forest|mountain"))

head(val_data_untuned, 100)

Load each image and convert it into an array, since every image has 2 spatial dimensions and 3 color channels (RGB). Then use the model to predict on the original images from the folder, without the image generator, since the generator would transform the images and no longer reflect the actual data.

# Function to convert image to array
image_prep <- function(x) {
  arrays <- lapply(x, function(path) {
    img <- image_load(path, target_size = target_size, 
                      grayscale = F # Set FALSE if image is RGB
                      )
    
    x <- image_to_array(img)
    x <- array_reshape(x, c(1, dim(x)))
    x <- x/255 # rescale image pixel
  })
  do.call(abind::abind, c(arrays, list(along = 1)))
}
test_x_untuned <- image_prep(val_data_untuned$file_name)
dim(test_x_untuned)
#> [1] 264 128 128   3
pred_test_untuned <- predict(model, test_x_untuned) %>% 
  k_argmax() %>% # for taking the highest probability
  as.array() %>% 
  as.factor()


head(pred_test_untuned, 10)
#>  [1] 0 0 0 0 0 0 0 0 0 0
#> Levels: 0 1 2

Convert the encoded predictions into class labels for easier interpretation.

# Convert encoding to label
decode <- function(x){
  case_when(x == 0 ~ "beach",
            x == 1 ~ "forest",
            x == 2 ~ "mountain"
            )
}

pred_test_untuned <- sapply(pred_test_untuned, decode) 

pred_test_untuned
#>   [1] "beach"    "beach"    "beach"    "beach"    "beach"    "beach"   
#>   [7] "beach"    "beach"    "beach"    "beach"    "beach"    "beach"   
#>  [13] "beach"    "beach"    "beach"    "mountain" "beach"    "beach"   
#>  [19] "beach"    "mountain" "beach"    "beach"    "mountain" "beach"   
#>  [25] "beach"    "beach"    "beach"    "mountain" "beach"    "beach"   
#>  [31] "beach"    "mountain" "mountain" "beach"    "mountain" "beach"   
#>  [37] "beach"    "beach"    "beach"    "beach"    "beach"    "mountain"
#>  [43] "beach"    "beach"    "beach"    "beach"    "beach"    "mountain"
#>  [49] "beach"    "beach"    "beach"    "beach"    "mountain" "beach"   
#>  [55] "mountain" "beach"    "beach"    "mountain" "beach"    "beach"   
#>  [61] "mountain" "beach"    "beach"    "mountain" "beach"    "beach"   
#>  [67] "mountain" "beach"    "mountain" "beach"    "beach"    "mountain"
#>  [73] "mountain" "beach"    "beach"    "beach"    "beach"    "beach"   
#>  [79] "beach"    "beach"    "beach"    "forest"   "beach"    "beach"   
#>  [85] "beach"    "forest"   "forest"   "forest"   "forest"   "forest"  
#>  [91] "forest"   "forest"   "forest"   "forest"   "forest"   "forest"  
#>  [97] "forest"   "forest"   "mountain" "forest"   "forest"   "forest"  
#> [103] "forest"   "forest"   "forest"   "mountain" "mountain" "forest"  
#> [109] "forest"   "forest"   "forest"   "forest"   "forest"   "mountain"
#> [115] "forest"   "forest"   "forest"   "forest"   "forest"   "forest"  
#> [121] "forest"   "mountain" "forest"   "forest"   "forest"   "forest"  
#> [127] "forest"   "forest"   "forest"   "forest"   "mountain" "forest"  
#> [133] "forest"   "forest"   "forest"   "forest"   "forest"   "forest"  
#> [139] "forest"   "forest"   "forest"   "forest"   "forest"   "mountain"
#> [145] "forest"   "forest"   "forest"   "forest"   "forest"   "forest"  
#> [151] "forest"   "forest"   "forest"   "forest"   "forest"   "forest"  
#> [157] "forest"   "forest"   "mountain" "forest"   "forest"   "mountain"
#> [163] "forest"   "forest"   "forest"   "mountain" "mountain" "forest"  
#> [169] "mountain" "mountain" "mountain" "beach"    "mountain" "mountain"
#> [175] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [181] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [187] "mountain" "forest"   "mountain" "mountain" "mountain" "mountain"
#> [193] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [199] "mountain" "mountain" "mountain" "mountain" "forest"   "mountain"
#> [205] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [211] "forest"   "mountain" "mountain" "mountain" "mountain" "mountain"
#> [217] "mountain" "forest"   "mountain" "mountain" "mountain" "forest"  
#> [223] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [229] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [235] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [241] "beach"    "mountain" "mountain" "mountain" "mountain" "mountain"
#> [247] "mountain" "mountain" "mountain" "forest"   "mountain" "beach"   
#> [253] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [259] "mountain" "mountain" "mountain" "mountain" "beach"    "mountain"

Evaluating the model using the confusion matrix.

confusionMatrix(data=as.factor(pred_test_untuned), reference = as.factor(val_data_untuned$class))
#> Confusion Matrix and Statistics
#> 
#>           Reference
#> Prediction beach forest mountain
#>   beach       66      0        4
#>   forest       1     71        7
#>   mountain    18      9       88
#> 
#> Overall Statistics
#>                                               
#>                Accuracy : 0.8523              
#>                  95% CI : (0.8036, 0.8928)    
#>     No Information Rate : 0.375               
#>     P-Value [Acc > NIR] : < 0.0000000000000002
#>                                               
#>                   Kappa : 0.7764              
#>                                               
#>  Mcnemar's Test P-Value : 0.01726             
#> 
#> Statistics by Class:
#> 
#>                      Class: beach Class: forest Class: mountain
#> Sensitivity                0.7765        0.8875          0.8889
#> Specificity                0.9777        0.9565          0.8364
#> Pos Pred Value             0.9429        0.8987          0.7652
#> Neg Pred Value             0.9021        0.9514          0.9262
#> Prevalence                 0.3220        0.3030          0.3750
#> Detection Rate             0.2500        0.2689          0.3333
#> Detection Prevalence       0.2652        0.2992          0.4356
#> Balanced Accuracy          0.8771        0.9220          0.8626

8 Tuning Model

8.1 Show The Untuned Model

model
#> Model: "simple_model"
#> ________________________________________________________________________________
#>  Layer (type)                       Output Shape                    Param #     
#> ================================================================================
#>  conv2d (Conv2D)                    (None, 128, 128, 32)            896         
#>  max_pooling2d (MaxPooling2D)       (None, 64, 64, 32)              0           
#>  flatten (Flatten)                  (None, 131072)                  0           
#>  dense (Dense)                      (None, 1000)                    131073000   
#>  Output (Dense)                     (None, 3)                       3003        
#> ================================================================================
#> Total params: 131,076,899
#> Trainable params: 131,076,899
#> Non-trainable params: 0
#> ________________________________________________________________________________

Let’s recall the untuned model architecture. The single convolutional layer only extracts low-level image features before being downsampled by the max-pooling layer. Its output is still a 64 x 64 array, so much information remains unextracted before the data is flattened. Therefore, add more layers to the model to capture more information/features from the image.
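The tuned architecture in the next sections stacks five conv + max-pooling blocks; since padding = "same" preserves the spatial size through each convolution, only the 2x2 poolings shrink the feature map. A small arithmetic sketch of the spatial sizes:

```r
# Trace how 2x2 max pooling shrinks a 128 x 128 input across 5 pooling layers
# (padding = "same" keeps each convolution's output the same size)
sizes <- Reduce(function(s, i) s %/% 2, 1:5, init = 128, accumulate = TRUE)
sizes # 128 64 32 16 8 4
```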


8.2 Image Augmentation

In this tuning stage, the batch size is reduced to 8, while the target size stays at 128 x 128 px.

# Desired height and width of images
target_size_tuned <- c(128,128)

# Batch size for training the model
batch_size_tuned <- 8
# Image Generator
train_data_gen_tuned <- image_data_generator(rescale = 1/255,
                                       horizontal_flip = T,
                                       width_shift_range = 0.2,
                                       height_shift_range = 0.2,
                                       zoom_range = 0.2,
                                       brightness_range = c(1,2),
                                       fill_mode = "nearest",
                                       validation_split = 0.2)

The generator is again applied using flow_images_from_directory() on the data/train directory, yielding augmented images for both the training data and the validation data, split 80% for training and 20% for validation.

# Training Dataset
train_image_array_gen_tuned <- flow_images_from_directory(directory = "data/train/", # Folder of the data
                                                    target_size = target_size_tuned, # target of the image dimension (128 x 128)
                                                    color_mode = "rgb", # use RGB color
                                                    batch_size = batch_size_tuned , 
                                                    seed = 143,  # set random seed
                                                    subset = "training", # declare that this is for training data
                                                    generator = train_data_gen_tuned
                                                    )

# Validation Dataset
val_image_array_gen_tuned <- flow_images_from_directory(directory = "data/train/",
                                                  target_size = target_size_tuned, 
                                                  color_mode = "rgb", 
                                                  batch_size = batch_size_tuned ,
                                                  seed = 143,
                                                  subset = "validation", # declare that this is the validation data
                                                  generator = train_data_gen_tuned
                                                  )
# Number of training samples

train_sample_tuned <- train_image_array_gen_tuned$n

# Number of validation samples

valid_sample_tuned <- val_image_array_gen_tuned$n

# Number of target classes/categories
output_n_tuned <- n_distinct(train_image_array_gen_tuned$classes)

# Get the class proportion
# train
table("\nFrequency" = factor(train_image_array_gen_tuned$classes)
      ) %>% 
  prop.table()
#> 
#> Frequency
#>         0         1         2 
#> 0.3214286 0.3045113 0.3740602

8.3 Tuning Model Architecture

Let’s build a tuned model architecture, improving on the previous model with the following layers:

  • 1st Convolutional layer to extract features from 2D image with relu activation function and 32 filters
  • Max pooling layer 1 to downsample the image features
  • 2nd Convolutional layer to extract features from 2D image with relu activation function and 64 filters
  • Max pooling layer 2 to downsample the image features
  • 3rd Convolution Layer to extract features from 2D image with relu activation function and 128 filters
  • Max pooling layer 3 to downsample the image features
  • 4th Convolution Layer to extract features from 2D image with relu activation function and 256 filters
  • Max pooling layer 4 to downsample the image features
  • 5th Convolution Layer to extract features from 2D image with relu activation function and 256 filters
  • Max pooling layer 5 to downsample the image features
  • Flattening layer to convert from 2D array to 1D array
  • Dense layer 1 to capture more information with relu activation
  • Dense layer for output layer with softmax activation

As before, the last number of input_shape is 3, indicating that the input images are RGB. If the input images were grayscale, the last number of input_shape would be set to 1.

# Set Initial Random Weight
# tensorflow::tf$random$set_seed(121)
tensorflow::set_random_seed(107)
model_tuned <- keras_model_sequential(name = "model_tuned") %>% 
  
  # 1st Convolution Layer
  layer_conv_2d(filters = 32,
                kernel_size = c(3,3),
                padding = "same",
                activation = "relu",
                input_shape = c(target_size_tuned, 3) 
                ) %>% 
  
  # Max Pooling Layer 1
  layer_max_pooling_2d(pool_size = c(2,2)) %>% 
  
  # 2nd Convolution Layer (input_shape is only needed on the first layer)
  layer_conv_2d(filters = 64,
                kernel_size = c(3,3),
                padding = "same",
                activation = "relu"
                ) %>% 

  # Max Pooling Layer 2
  layer_max_pooling_2d(pool_size = c(2,2)) %>% 
  
  # 3rd Convolution Layer
  layer_conv_2d(filters = 128,
                kernel_size = c(3,3),
                padding = "same",
                activation = "relu"
                ) %>% 
  
  # Max Pooling Layer 3
  layer_max_pooling_2d(pool_size = c(2,2)) %>% 
  
   # 4th Convolutional layer
  layer_conv_2d(filters = 256,
                kernel_size = c(3,3),
                padding = "same",
                activation = "relu"
                ) %>%

  # Max pooling layer 4
  layer_max_pooling_2d(pool_size = c(2,2)) %>%
  
  # 5th Convolutional layer
  layer_conv_2d(filters = 256,
                kernel_size = c(3,3),
                padding = "same",
                activation = "relu"
                ) %>%

  # Max pooling layer 5
  layer_max_pooling_2d(pool_size = c(2,2)) %>%

  
  # Flattening Layer
  layer_flatten() %>% 
  
  # Dense Layer 1
  layer_dense(units = 128,
              activation = "relu") %>% 
  
  # Output Layer
  layer_dense(units = output_n_tuned,
              activation = "softmax",
              name = "Output")
  
model_tuned
#> Model: "model_tuned"
#> ________________________________________________________________________________
#>  Layer (type)                       Output Shape                    Param #     
#> ================================================================================
#>  conv2d_5 (Conv2D)                  (None, 128, 128, 32)            896         
#>  max_pooling2d_5 (MaxPooling2D)     (None, 64, 64, 32)              0           
#>  conv2d_4 (Conv2D)                  (None, 64, 64, 64)              18496       
#>  max_pooling2d_4 (MaxPooling2D)     (None, 32, 32, 64)              0           
#>  conv2d_3 (Conv2D)                  (None, 32, 32, 128)             73856       
#>  max_pooling2d_3 (MaxPooling2D)     (None, 16, 16, 128)             0           
#>  conv2d_2 (Conv2D)                  (None, 16, 16, 256)             295168      
#>  max_pooling2d_2 (MaxPooling2D)     (None, 8, 8, 256)               0           
#>  conv2d_1 (Conv2D)                  (None, 8, 8, 256)               590080      
#>  max_pooling2d_1 (MaxPooling2D)     (None, 4, 4, 256)               0           
#>  flatten_1 (Flatten)                (None, 4096)                    0           
#>  dense_1 (Dense)                    (None, 128)                     524416      
#>  Output (Dense)                     (None, 3)                       387         
#> ================================================================================
#> Total params: 1,503,299
#> Trainable params: 1,503,299
#> Non-trainable params: 0
#> ________________________________________________________________________________

8.4 Tuning Model Fitting

In the tuned model fitting, the learning rate is increased from 0.001 to 0.01 and the number of epochs is decreased from 40 to 35. Categorical cross-entropy is again used as the loss function, together with the SGD optimizer.

model_tuned %>% 
  compile(
    loss = "categorical_crossentropy",
    optimizer = optimizer_sgd(learning_rate = 0.01),
    metrics = "accuracy"
  )

# Fit data into model
history_tuned <- model_tuned %>% 
  fit_generator(
  # training data
  train_image_array_gen_tuned, 

  # training epochs
  steps_per_epoch = as.integer(train_sample_tuned / batch_size_tuned), 
  epochs = 35, 
  
  # validation data
  validation_data = val_image_array_gen_tuned, 
  validation_steps = as.integer(valid_sample_tuned / batch_size_tuned))
  
plot(history_tuned)

The plot above shows that model_tuned performs well: the loss is close to zero and the gap between training and validation accuracy is small, meaning the model is neither underfitting nor overfitting.


8.5 Model Evaluation

The model is evaluated with a confusion matrix using the validation data from the generator. First, get the file names of the validation images; then extract the categorical label from each file name as the actual value of the target variable.

val_data <- data.frame(file_name = paste0("data/train/", val_image_array_gen_tuned$filenames)) %>% 
  mutate(class = str_extract(file_name, "beach|forest|mountain"))

head(val_data, 100)

Next, each image is converted into an array, since every image has 2 spatial dimensions and 3 color channels (RGB). The model then predicts the original images loaded directly from the folder, not through the image generator, because the generator would transform the images and no longer reflect the actual data.

# Function to convert image to array
image_prep_tuned <- function(x) {
  arrays <- lapply(x, function(path) {
    img <- image_load(path, target_size = target_size_tuned, 
                      grayscale = FALSE # the images are RGB
                      )
    
    x <- image_to_array(img)
    x <- array_reshape(x, c(1, dim(x)))
    x <- x / 255 # rescale pixel values to [0, 1]
  })
  do.call(abind::abind, c(arrays, list(along = 1)))
}
test_x <- image_prep_tuned(val_data$file_name)
dim(test_x)
#> [1] 264 128 128   3
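What image_prep_tuned() does can be illustrated without keras: each image becomes a (height, width, channels) array, gains a leading batch dimension, and the per-image arrays are stacked along that dimension. A minimal base-R sketch with tiny fake 4x4 "images":

```r
# Two fake 4x4 RGB "images" with pixel values in 0..255
img1 <- array(runif(4 * 4 * 3, 0, 255), dim = c(4, 4, 3))
img2 <- array(runif(4 * 4 * 3, 0, 255), dim = c(4, 4, 3))

# Rescale to [0, 1] and stack along a new leading batch dimension
batch <- array(0, dim = c(2, 4, 4, 3))
batch[1, , , ] <- img1 / 255
batch[2, , , ] <- img2 / 255

dim(batch)  # 2 4 4 3  -- (batch, height, width, channels)
```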

The predictions for the validation data can now be generated and summarized in a confusion matrix.

pred_test <- predict(model_tuned, test_x) %>% 
  k_argmax() %>% # take the class index with the highest probability
  as.array() %>% 
  as.factor()


head(pred_test, 10)
#>  [1] 0 0 0 0 0 0 0 0 0 0
#> Levels: 0 1 2
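k_argmax() simply picks, for each row, the column with the highest predicted probability, using zero-based indices that match the generator's class encoding. The same logic can be reproduced in base R on a made-up probability matrix:

```r
# Hypothetical softmax output for 3 images over 3 classes
prob <- matrix(c(0.7, 0.2, 0.1,    # -> class 0 (beach)
                 0.1, 0.8, 0.1,    # -> class 1 (forest)
                 0.2, 0.1, 0.7),   # -> class 2 (mountain)
               nrow = 3, byrow = TRUE)

# max.col() is 1-based, so subtract 1 to match keras' 0-based indices
pred <- max.col(prob) - 1
pred  # 0 1 2
```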

The encoded predictions are converted back into class labels for easier interpretation.

# Convert encoding to label
decode <- function(x){
  case_when(x == 0 ~ "beach",
            x == 1 ~ "forest",
            x == 2 ~ "mountain"
            )
}
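An equivalent, dplyr-free way to decode is plain vector indexing over the class labels. This only relies on the labels being in the generator's alphabetical order (beach = 0, forest = 1, mountain = 2); the decode_fast() name is just for this sketch.

```r
labels <- c("beach", "forest", "mountain")

# 0-based class indices -> labels (R indexing is 1-based, hence + 1)
# as.character() first, so factor input (like pred_test) decodes correctly
decode_fast <- function(x) labels[as.integer(as.character(x)) + 1]

decode_fast(c(0, 1, 2, 0))  # "beach" "forest" "mountain" "beach"
```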

pred_test <- sapply(pred_test, decode) 

pred_test
#>   [1] "beach"    "beach"    "beach"    "beach"    "beach"    "beach"   
#>   [7] "beach"    "beach"    "beach"    "beach"    "beach"    "beach"   
#>  [13] "beach"    "beach"    "beach"    "beach"    "beach"    "beach"   
#>  [19] "beach"    "mountain" "beach"    "beach"    "beach"    "beach"   
#>  [25] "beach"    "beach"    "beach"    "beach"    "beach"    "beach"   
#>  [31] "beach"    "mountain" "beach"    "beach"    "mountain" "beach"   
#>  [37] "beach"    "beach"    "beach"    "beach"    "beach"    "beach"   
#>  [43] "beach"    "beach"    "beach"    "beach"    "beach"    "beach"   
#>  [49] "beach"    "beach"    "beach"    "beach"    "mountain" "beach"   
#>  [55] "beach"    "beach"    "beach"    "mountain" "beach"    "beach"   
#>  [61] "beach"    "beach"    "beach"    "beach"    "beach"    "beach"   
#>  [67] "beach"    "beach"    "beach"    "beach"    "beach"    "beach"   
#>  [73] "beach"    "beach"    "beach"    "beach"    "beach"    "beach"   
#>  [79] "beach"    "beach"    "beach"    "beach"    "beach"    "beach"   
#>  [85] "beach"    "forest"   "forest"   "forest"   "forest"   "forest"  
#>  [91] "forest"   "forest"   "forest"   "forest"   "forest"   "forest"  
#>  [97] "forest"   "forest"   "forest"   "forest"   "forest"   "forest"  
#> [103] "forest"   "forest"   "forest"   "forest"   "mountain" "forest"  
#> [109] "forest"   "forest"   "forest"   "forest"   "forest"   "mountain"
#> [115] "forest"   "forest"   "forest"   "forest"   "forest"   "forest"  
#> [121] "forest"   "forest"   "forest"   "forest"   "forest"   "forest"  
#> [127] "forest"   "forest"   "forest"   "forest"   "forest"   "forest"  
#> [133] "forest"   "forest"   "forest"   "forest"   "forest"   "forest"  
#> [139] "forest"   "forest"   "forest"   "forest"   "forest"   "mountain"
#> [145] "forest"   "forest"   "forest"   "forest"   "forest"   "forest"  
#> [151] "forest"   "forest"   "forest"   "forest"   "forest"   "forest"  
#> [157] "forest"   "forest"   "forest"   "forest"   "forest"   "forest"  
#> [163] "forest"   "forest"   "forest"   "mountain" "mountain" "forest"  
#> [169] "beach"    "mountain" "beach"    "beach"    "mountain" "mountain"
#> [175] "mountain" "mountain" "mountain" "mountain" "beach"    "mountain"
#> [181] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [187] "beach"    "forest"   "mountain" "mountain" "beach"    "mountain"
#> [193] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [199] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [205] "mountain" "mountain" "mountain" "beach"    "mountain" "mountain"
#> [211] "forest"   "mountain" "mountain" "mountain" "mountain" "mountain"
#> [217] "mountain" "forest"   "mountain" "mountain" "mountain" "forest"  
#> [223] "mountain" "mountain" "mountain" "beach"    "mountain" "mountain"
#> [229] "mountain" "beach"    "mountain" "forest"   "mountain" "mountain"
#> [235] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [241] "beach"    "mountain" "mountain" "mountain" "mountain" "mountain"
#> [247] "mountain" "mountain" "mountain" "mountain" "mountain" "beach"   
#> [253] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [259] "mountain" "mountain" "mountain" "mountain" "beach"    "mountain"

The model is evaluated using the confusion matrix. This model performs better than the previous one because additional CNN layers were stacked to extract more features from the images.

confusionMatrix(data=as.factor(pred_test), reference = as.factor(val_data$class))
#> Confusion Matrix and Statistics
#> 
#>           Reference
#> Prediction beach forest mountain
#>   beach       80      0       12
#>   forest       0     77        6
#>   mountain     5      3       81
#> 
#> Overall Statistics
#>                                                
#>                Accuracy : 0.9015               
#>                  95% CI : (0.859, 0.9347)      
#>     No Information Rate : 0.375                
#>     P-Value [Acc > NIR] : < 0.00000000000000022
#>                                                
#>                   Kappa : 0.8521               
#>                                                
#>  Mcnemar's Test P-Value : NA                   
#> 
#> Statistics by Class:
#> 
#>                      Class: beach Class: forest Class: mountain
#> Sensitivity                0.9412        0.9625          0.8182
#> Specificity                0.9330        0.9674          0.9515
#> Pos Pred Value             0.8696        0.9277          0.9101
#> Neg Pred Value             0.9709        0.9834          0.8971
#> Prevalence                 0.3220        0.3030          0.3750
#> Detection Rate             0.3030        0.2917          0.3068
#> Detection Prevalence       0.3485        0.3144          0.3371
#> Balanced Accuracy          0.9371        0.9649          0.8848

The confusion matrix shows that model_tuned classifies beach, forest, and mountain images with 90.15% accuracy on the validation data.
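The headline accuracy can be verified directly from the confusion matrix counts above: it is the sum of the diagonal (correct predictions) divided by the total number of validation images.

```r
# Confusion matrix counts from the output above (rows = prediction)
cm <- matrix(c(80,  0, 12,
                0, 77,  6,
                5,  3, 81),
             nrow = 3, byrow = TRUE,
             dimnames = list(prediction = c("beach", "forest", "mountain"),
                             reference  = c("beach", "forest", "mountain")))

accuracy <- sum(diag(cm)) / sum(cm)
round(accuracy, 4)  # 0.9015
```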


9 Predict Data in Testing Dataset

So that the tuned model can be evaluated with a confusion matrix, the contents of the test folder are divided into three class folders (beach, forest, mountain), with each image placed in its corresponding folder. The categorical label is then extracted from the file name as the actual value of the target variable.

# Test Dataset
test_image_array_gen_tuned <- flow_images_from_directory(directory = "data/test/",
                                                  target_size = target_size_tuned, 
                                                  color_mode = "rgb", 
                                                  batch_size = batch_size_tuned ,
                                                  seed = 173,
                                                  generator = train_data_gen_tuned
                                                  )
# Extract the class labels from the file names
val_test_data <- data.frame(file_name = paste0("data/test/", test_image_array_gen_tuned$filenames)) %>% 
  mutate(class = str_extract(file_name, "beach|forest|mountain"))

val_test_data

As before, each image is converted into an array with the image_prep_tuned() function defined earlier, and the model predicts the original images loaded directly from the folder rather than through the image generator.

test1_x <- image_prep_tuned(val_test_data$file_name)
dim(test1_x)
#> [1] 294 128 128   3

The predictions for the test data are then generated and summarized in a confusion matrix.

pred1_test <- predict(model_tuned, test1_x) %>% 
  k_argmax() %>% # take the class index with the highest probability
  as.array() %>% 
  as.factor()


head(pred1_test, 10)
#>  [1] 0 2 0 0 0 0 0 0 0 0
#> Levels: 0 1 2

The encoded predictions are converted back into class labels, reusing the decode() function defined earlier.

pred1_test <- sapply(pred1_test, decode) 

pred1_test
#>   [1] "beach"    "mountain" "beach"    "beach"    "beach"    "beach"   
#>   [7] "beach"    "beach"    "beach"    "beach"    "beach"    "beach"   
#>  [13] "mountain" "beach"    "beach"    "beach"    "beach"    "beach"   
#>  [19] "beach"    "mountain" "beach"    "beach"    "beach"    "mountain"
#>  [25] "beach"    "beach"    "beach"    "beach"    "beach"    "mountain"
#>  [31] "beach"    "beach"    "beach"    "beach"    "beach"    "mountain"
#>  [37] "forest"   "beach"    "beach"    "beach"    "beach"    "beach"   
#>  [43] "beach"    "beach"    "beach"    "beach"    "beach"    "beach"   
#>  [49] "beach"    "beach"    "beach"    "beach"    "forest"   "beach"   
#>  [55] "beach"    "beach"    "beach"    "beach"    "beach"    "beach"   
#>  [61] "beach"    "beach"    "beach"    "beach"    "beach"    "beach"   
#>  [67] "beach"    "beach"    "beach"    "beach"    "beach"    "beach"   
#>  [73] "beach"    "beach"    "beach"    "beach"    "beach"    "beach"   
#>  [79] "beach"    "beach"    "beach"    "beach"    "mountain" "beach"   
#>  [85] "beach"    "beach"    "beach"    "beach"    "beach"    "beach"   
#>  [91] "beach"    "beach"    "beach"    "beach"    "beach"    "beach"   
#>  [97] "beach"    "beach"    "beach"    "beach"    "beach"    "beach"   
#> [103] "mountain" "forest"   "beach"    "beach"    "beach"    "beach"   
#> [109] "beach"    "mountain" "forest"   "forest"   "forest"   "forest"  
#> [115] "forest"   "forest"   "forest"   "forest"   "forest"   "mountain"
#> [121] "forest"   "forest"   "forest"   "forest"   "forest"   "forest"  
#> [127] "forest"   "mountain" "forest"   "forest"   "mountain" "forest"  
#> [133] "forest"   "forest"   "forest"   "forest"   "mountain" "mountain"
#> [139] "forest"   "forest"   "forest"   "forest"   "forest"   "forest"  
#> [145] "forest"   "forest"   "forest"   "forest"   "forest"   "forest"  
#> [151] "forest"   "forest"   "forest"   "forest"   "forest"   "mountain"
#> [157] "forest"   "forest"   "mountain" "forest"   "forest"   "forest"  
#> [163] "forest"   "forest"   "forest"   "forest"   "forest"   "forest"  
#> [169] "forest"   "forest"   "forest"   "forest"   "forest"   "forest"  
#> [175] "mountain" "forest"   "forest"   "forest"   "forest"   "forest"  
#> [181] "forest"   "forest"   "forest"   "forest"   "forest"   "forest"  
#> [187] "mountain" "forest"   "forest"   "forest"   "forest"   "forest"  
#> [193] "forest"   "forest"   "beach"    "mountain" "mountain" "mountain"
#> [199] "mountain" "mountain" "mountain" "beach"    "mountain" "mountain"
#> [205] "mountain" "mountain" "beach"    "mountain" "beach"    "mountain"
#> [211] "mountain" "mountain" "mountain" "mountain" "beach"    "mountain"
#> [217] "mountain" "mountain" "beach"    "mountain" "mountain" "mountain"
#> [223] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [229] "mountain" "beach"    "mountain" "mountain" "mountain" "mountain"
#> [235] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [241] "mountain" "beach"    "mountain" "mountain" "mountain" "mountain"
#> [247] "beach"    "mountain" "mountain" "mountain" "mountain" "beach"   
#> [253] "mountain" "mountain" "mountain" "beach"    "mountain" "beach"   
#> [259] "mountain" "mountain" "mountain" "mountain" "mountain" "beach"   
#> [265] "beach"    "mountain" "mountain" "mountain" "mountain" "mountain"
#> [271] "mountain" "beach"    "mountain" "mountain" "beach"    "mountain"
#> [277] "forest"   "mountain" "mountain" "beach"    "mountain" "mountain"
#> [283] "mountain" "mountain" "forest"   "mountain" "mountain" "mountain"
#> [289] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"

Finally, the model is evaluated on the test data using the confusion matrix.

confusionMatrix(data=as.factor(pred1_test), reference = as.factor(val_test_data$class))
#> Confusion Matrix and Statistics
#> 
#>           Reference
#> Prediction beach forest mountain
#>   beach       98      0       17
#>   forest       3     75        2
#>   mountain     9      9       81
#> 
#> Overall Statistics
#>                                               
#>                Accuracy : 0.8639              
#>                  95% CI : (0.8194, 0.901)     
#>     No Information Rate : 0.3741              
#>     P-Value [Acc > NIR] : < 0.0000000000000002
#>                                               
#>                   Kappa : 0.7943              
#>                                               
#>  Mcnemar's Test P-Value : 0.01929             
#> 
#> Statistics by Class:
#> 
#>                      Class: beach Class: forest Class: mountain
#> Sensitivity                0.8909        0.8929          0.8100
#> Specificity                0.9076        0.9762          0.9072
#> Pos Pred Value             0.8522        0.9375          0.8182
#> Neg Pred Value             0.9330        0.9579          0.9026
#> Prevalence                 0.3741        0.2857          0.3401
#> Detection Rate             0.3333        0.2551          0.2755
#> Detection Prevalence       0.3912        0.2721          0.3367
#> Balanced Accuracy          0.8993        0.9345          0.8586

The confusion matrix shows that model_tuned classifies beach, forest, and mountain images with 86.39% accuracy on the test data. In this case, the most important metric is accuracy, because it represents the model's ability to classify each class correctly. The model could be improved further by modifying the architecture (number of neurons, number of layers, activation function type) or the hyperparameters (number of epochs, batch size, optimizer type, learning rate).
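The per-class statistics reported above follow directly from the test confusion matrix; for example, sensitivity (recall) for a class is the count of correct predictions divided by the column total for that reference class.

```r
# Test confusion matrix counts from the output above (rows = prediction)
cm <- matrix(c(98,  0, 17,
                3, 75,  2,
                9,  9, 81),
             nrow = 3, byrow = TRUE,
             dimnames = list(prediction = c("beach", "forest", "mountain"),
                             reference  = c("beach", "forest", "mountain")))

accuracy    <- sum(diag(cm)) / sum(cm)   # 254 / 294
sensitivity <- diag(cm) / colSums(cm)    # recall per reference class

round(accuracy, 4)     # 0.8639
round(sensitivity, 4)  # beach 0.8909, forest 0.8929, mountain 0.8100
```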


10 Convert to Excel

In this step, the images are not organized by class and the test folder is not divided into three class folders.

test <- read.csv("data/image-data-test.csv")

test_data <- data.frame(file_name = paste0("data/test1/",
                                          test$id))

head(test_data, 10)
test1_x <- image_prep_tuned(test_data$file_name)

# Check dimension of testing data set
dim(test1_x)
#> [1] 294 128 128   3
pred1_test <- predict(model_tuned, test1_x) %>% k_argmax() %>% as.array()
head(pred1_test, 10)
#>  [1] 0 1 2 0 2 2 2 1 1 0
pred1_test <- sapply(pred1_test, decode) 
head(pred1_test, 10)
#>  [1] "beach"    "forest"   "mountain" "beach"    "mountain" "mountain"
#>  [7] "mountain" "forest"   "forest"   "beach"
test$label <- pred1_test
head(test)
write.csv(test,"submission.csv",row.names = FALSE)
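A quick sanity check before uploading is to write a small mock data frame the same way and read it back, confirming the structure survives the round trip. The id/label column names and the two rows here are hypothetical; they should mirror your actual test CSV.

```r
# Mock submission with the assumed id/label columns
mock <- data.frame(id    = c("img_001.jpg", "img_002.jpg"),
                   label = c("beach", "mountain"))

path <- tempfile(fileext = ".csv")
write.csv(mock, path, row.names = FALSE)

# Read back and confirm the structure survived the round trip
check <- read.csv(path)
names(check)  # "id" "label"
nrow(check)   # 2
```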

11 Conclusion

This project built a deep learning model that can learn the features and characteristics of images from three classes (beach, forest, mountain). The model is a Convolutional Neural Network (CNN), and it performs very well after tuning: more CNN layers were stacked (modifying the architecture) to extract more information from the images, and the hyperparameters (number of epochs, batch size, optimizer type, learning rate) were adjusted. The goal is achieved: the tuned model classifies the three classes with high accuracy (86.39%) and a small gap between test and training accuracy, which indicates the model is neither underfitting nor overfitting. Potential business applications include breast cancer image classification, self-driving cars, product defect detection, animal species detection, license plate / ship code detection, and more.